Topic Stability over Noisy Sources
Abstract
Topic modelling techniques such as LDA have recently been applied to speech transcripts and OCR output. These corpora may contain noisy or erroneous text, which can undermine topic stability. It is therefore important to know how well a topic modelling algorithm performs when applied to noisy data. In this paper we show that different types of textual noise have different effects on the stability of different topic models. From these observations, we propose guidelines for text corpus generation, with a focus on automatic speech transcription. We also suggest topic model selection methods for noisy corpora.
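The abstract does not say how topic stability is measured. As a rough illustration only, one common way to compare two topic models is to match the top-word lists of their topics and average the overlap. The sketch below assumes that approach: the function names `jaccard` and `topic_stability`, the greedy one-to-one matching, and the toy word lists are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard similarity between two top-word lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def topic_stability(ref_topics: List[Sequence[str]],
                    noisy_topics: List[Sequence[str]]) -> float:
    """Average Jaccard overlap after greedy one-to-one topic matching.

    Assumption: each topic is represented by its top-N words, and each
    reference topic is paired with the most similar unused noisy topic.
    Reference topics left unmatched contribute 0 to the average.
    """
    if not ref_topics:
        return 0.0
    remaining = list(range(len(noisy_topics)))
    total = 0.0
    for ref in ref_topics:
        if not remaining:
            break
        best = max(remaining, key=lambda j: jaccard(ref, noisy_topics[j]))
        total += jaccard(ref, noisy_topics[best])
        remaining.remove(best)
    return total / len(ref_topics)


# Toy example: topics from a clean corpus vs. a corpus with
# transcription-style spelling errors (made-up word lists).
clean = [["speech", "audio", "signal"], ["topic", "model", "word"]]
noisy = [["topic", "modle", "word"], ["speech", "audio", "signl"]]
score = topic_stability(clean, noisy)
```

A score of 1.0 means every reference topic has an identical counterpart; lower values indicate that noise has shifted the top words. An optimal assignment (for example, the Hungarian algorithm) could replace the greedy matching if topic counts are large.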
Similar resources
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Multi-Label Classification from Multiple Noisy Sources Using Topic Models
Multi-label classification is a well-known supervised machine learning setting where each instance is associated with multiple classes. Examples include annotation of images with multiple labels, assigning multiple tags for a web page, etc. Since several labels can be assigned to a single instance, one of the key challenges in this problem is to learn the correlations between the classes. Our f...
Estimation of Source Location Using Curvature Analysis
A quadratic surface can be fitted to potential-field data within 3×3 windows, which allow us to calculate curvature attributes from its coefficients. Phillips (2007) derived an equation depending on the most negative curvature to obtain the depth and structural index of isolated sources from peak values of special functions. They divided the special functions into two categories: Model-specific...
Stable Super-Resolution of Positive Sources: the Discrete Setup
In single-molecule microscopy it is necessary to reconstruct a signal that consists of positive point sources from noisy observations of the spectrum of the signal in the low-frequency band [−fc, fc]. It is shown that the problem can be solved using convex optimization in a stable fashion. The stability of reconstruction depends on Rayleigh-regularity of the support of the signal, i.e., on how ...
How Noisy Social Media Text, How Diffrnt Social Media Sources?
While various claims have been made about text in social media text being noisy, there has never been a systematic study to investigate just how linguistically noisy or otherwise it is over a range of social media sources. We explore this question empirically over popular social media text types, in the form of YouTube comments, Twitter posts, web user forum posts, blog posts and Wikipedia, whi...
Publication date: 2016